All Questions

0 votes
1 answer
164 views

Reward not improving for a custom environment using PPO

I've been trying to train an agent on a custom environment I implemented with gym where the goal is to resolve voltage violations in a power grid by adjusting the active power (loads) at each node. I ...
W8_4_it's user avatar
0 votes
1 answer
290 views

Why is PPO not choosing a solution that is giving a higher cumulative reward?

I use PPO to train my fermenter (digital twin) to maximize enzyme (product) production. Action: 1 or 0, i.e. whether or not to add substrate at a particular time, based on the cells and enzymes present in the tank ...
user79474's user avatar
1 vote
1 answer
478 views

Getting always the same action on an A2C from stable_baselines3

I'm quite new to RL and have been trying to train an A2C model from stable_baselines3 to derive an integer sequence based on 3 other input sequences of floats. I have a custom gym environment that ...
Jesuspc's user avatar
